Purpose: We investigated a small-group intervention
designed to teach vocabulary and comprehension
skills to preschoolers who were at risk for language
and reading disabilities. These language skills are
important and reliable predictors of later academic
achievement.

Method: Preschoolers heard prerecorded stories 3 times
per week over the course of a school year. A cluster
randomized design was used to evaluate the effects of
hearing storybooks with and without embedded vocabulary
and comprehension lessons. A total of 32 classrooms
were randomly assigned to experimental and comparison
conditions. Approximately 6 children per classroom
demonstrating low vocabulary knowledge, totaling 195 children,
were enrolled.

Results: Preschoolers in the comparison condition did not
learn novel, challenging vocabulary words to which they
were exposed in story contexts, whereas preschoolers
receiving embedded lessons demonstrated significant
learning gains, although vocabulary learning diminished over
the course of the school year. Modest gains in comprehension
skills did not differ between the two groups.

Conclusion: The Story Friends curriculum appears to be
highly feasible for delivery in early childhood educational
settings and effective at teaching challenging vocabulary to
high-risk preschoolers.

Oral language skills have been established as an
important and reliable predictor of later reading
achievement (National Early Literacy Panel,
2008). Early language proficiency is correlated with reading
ability, in particular with reading comprehension (Storch
& Whitehurst, 2002). Many children begin school with
limited language ability, placing them at high risk of diagnosis
with reading disability and with later reading failure (Bishop
& Adams, 1990; Catts, Fey, Tomblin, & Zhang, 2002).
Many of these children are from families with low income
and begin school with language deficits relative to
middle-class peers (Hoff, 2013). For example, 33% of children
enrolled in Head Start were found to have language delays, with
scores more than 1 SD below the mean on norm-referenced
measures (Nelson, Welsh, Trup, & Greenberg, 2011).
Within the domain of language, vocabulary in particular
seems to be related to socioeconomic status: Children
from low-income families tend to have vocabularies that
are much smaller than those of children from middle-income
families (Hart & Risley, 1995; Qi, Kaiser, Milan, & Hancock,
2006). These early language deficits have long-term effects
on academic achievement (Walker, Greenwood, Hart, &
Carta, 1994).

Early childhood education presents an opportunity to
address these language deficits. Preschool experiences with
vocabulary predict later reading comprehension (Dickinson
& Porche, 2011). Unfortunately, improving the language
experiences of early childhood settings can be challenging
(Dickinson, 2011). Given the wide range of language
skills of preschoolers, it is critical to adapt instruction to the
needs of all children. A multitiered system of supports (MTSS)
is an increasingly prevalent approach to achieving that goal
(Berkeley, Bender, Gregg Peaster, & Saunders, 2009). In
MTSS models, all children receive Tier 1 classroom instruction
focused on important academic skills. Children are
screened periodically to identify those who are falling behind
developmentally. Instruction, often categorized into Tier 2
and Tier 3 levels of support, is provided as appropriate
to levels of need. Progress is monitored so that adjustments
can be made in the tiers of support children receive. There
is evidence that MTSS can be effective in improving child
outcomes (Burns, Appleton, & Stehouwer, 2005), and a
few studies have examined MTSS in early childhood settings
(Gettinger & Stoiber, 2008; VanDerHeyden, Snyder,
Broussard, & Ramsdell, 2008; VanDerHeyden, Witt, &
Gilbertson, 2007). To implement MTSS effectively to address
language deficits, there is a need for efficacious Tier 2
and Tier 3 interventions. Tier 2 interventions typically
supplement Tier 1 instruction and often are delivered by
classroom staff in small groups. Because many children in
a classroom may require Tier 2 language intervention
(Greenwood et al., 2013), it is especially important that
interventions can be delivered with high fidelity using existing
classroom resources.

Tiered vocabulary programs have been implemented
with preschoolers and kindergartners (Loftus, Coyne,
McCoach, & Zipoli, 2010; Pullen, Tuckwiller, Konold,
Maynard, & Coyne, 2010; Zucker, Solari, Landry, &
Swank, 2013). These studies found that at-risk students made
greater vocabulary gains when a second tier of instruction
was added. This may not be surprising, as observational
studies have found limited instructional support in early
childhood settings (Greenwood et al., 2013). MTSS research
in the primary grades can inform the design of Tier 2
interventions to maximize learning (Gersten et al., 2008).
Effective instruction needs to be explicit, teach important
instructional targets, and support learning through
scaffolding and many opportunities to respond (Foorman &
Torgesen, 2001; Snow, Burns, & Griffin, 1998). Vocabulary
interventions that incorporate some of these components
have produced moderate to large effects on proximal
measures of student learning (Beck & McKeown, 2007; Coyne,
McCoach, & Kapp, 2007; Johnson & Yeates, 2006; Loftus
et al., 2010; Marulis & Neuman, 2010; Mol, Bus, & de Jong,
2009; Neuman, Newman, & Dwyer, 2011).

Tier 2 instruction needs to be characterized by high
fidelity and high feasibility to sustain implementation in
authentic early childhood education settings. Fidelity of
implementation (the extent to which a treatment is
implemented as intended) is a critical consideration for effective
intervention. Researchers have found that higher levels of
fidelity produce stronger treatment effects (Pas & Bradshaw,
2012; Zucker et al., 2013). When intervention is
implemented by educational staff, fidelity of implementation
is often lower than when intervention is implemented by
researchers (Hulleman & Cordray, 2009; Mol et al., 2009).
Dickinson (2011) reported fidelity of just 38% of
instructional elements during a shared storybook reading
intervention following teacher training. Other research groups
have reported similarly discouraging findings (Justice,
Mashburn, Hamre, & Pianta, 2008). Within a given
treatment, some components may be easier than others
to implement in classrooms. Adherence to procedures of
an intervention (e.g., providing specific activities, using
prescribed materials) is frequently high, whereas the use
of instructional techniques and processes (e.g., language
modeling) is much lower (Justice et al., 2008; Pence, Justice,
& Wiggins, 2008). Zucker et al. (2013) found that
implementation fidelity was higher for the first tier of instruction
than for the second tier. The authors hypothesized that the
shared storybook reading context and strategies utilized
in Tier 1 were familiar to teachers and thus easy to
implement with fidelity. The explicit, extended vocabulary
teaching expected in Tier 2 was less familiar and more
challenging. Consideration of implementation feasibility
prompted the development and evaluation of Story Friends,
a Tier 2 pre-K curriculum that delivers key instructional
components using automated prerecorded lessons.

Goldstein and colleagues conducted a series of studies
to develop, refine, and examine the effects of the Story
Friends curriculum, a Tier 2 intervention in which
vocabulary and comprehension instruction is embedded in
specially designed storybooks and prerecorded audio (Greenwood
et al., in press; Kelley, Goldstein, Spencer, & Sherman,
2015; Spencer, Goldstein, Sherman, et al., 2012). A brief
description of the program is provided below; additional
discussion of the curriculum design and details of the iterative
development process are available in Kelley and Goldstein
(2014).

Story Friends is a supplemental preschool program
designed to make high implementation fidelity feasible.
Rather than a Tier 2 program specific to a single early
childhood curriculum, the program is designed to
accompany a variety of Tier 1 curricula. The program targets
two important skills within the domain of oral language:
vocabulary knowledge and language comprehension. To
address vocabulary knowledge, sophisticated vocabulary
words were targeted using explicit instruction. Vocabulary
targets were selected by applying Beck and McKeown’s
(2007; Beck, McKeown, & Kucan, 2002) recommendations
to teach challenging, high-utility vocabulary words
that occur frequently in the language of adults
and in written text. Beck and colleagues argued that
these sophisticated vocabulary words, rather than commonly
used words likely to be acquired without instruction,
are the most appropriate targets for explicit instruction.

For the Story Friends program, we applied the following
criteria to select words: (a) words that were unlikely to
be familiar to preschool children with limited vocabulary;
(b) words that were likely to occur frequently in
sophisticated spoken and written language; (c) words that could
be defined with a simple, child-friendly explanation; and
(d) words that could be supported in the story context
with explicit instruction (Spencer, Goldstein, & Kaminski,
2012). Systematic, explicit instruction for each vocabulary
target was designed to include a reference to the story
context; a simple, child-friendly definition; and connections
between the word’s meaning and children’s everyday
experiences. A recorded narrator provided several opportunities
for children to say the word and the definition and then
modeled correct responses. Many of these components of
explicit instruction have been identified in other vocabulary
intervention studies (e.g., Beck & McKeown, 2007; Coyne
et al., 2007).

To address language comprehension, Story Friends
targets answering questions about stories. Our preliminary
studies of Story Friends found that most preschoolers
were able to answer literal questions. Therefore, inferential
questions for which the answer was not explicitly stated
in the story text were the focus of instruction. Inferential
language has been shown to contribute to reading
comprehension (Cain, Oakhill, & Bryant, 2004; Cain, Oakhill,
& Lemmon, 2004). Inferential questions included poststory
predictions (e.g., “Do you think the Jungle Friends will
go to the beach again? Why/why not?”), questions about
character emotions (e.g., “Why is Ellie happy?”), and
questions about character actions (e.g., “Why did Leo help
Marquez?”). Embedded lessons for story questions included
a model of an appropriate response and a “think aloud”
explanation of the response. This approach to teaching
inferential language in the context of storybooks was informed
by the work of van Kleeck and colleagues (van Kleeck,
2006, 2008; van Kleeck, van der Woude, & Hammett, 2006).
Appendix 1 includes sample scripts for vocabulary and
comprehension lessons.

To ensure high-fidelity implementation in classroom
settings, Story Friends makes use of an innovative automated
approach. Stories and embedded lessons are prerecorded;
children listen through headphones in small groups and
follow along in accompanying storybooks. The automated
approach ensures that key elements of the carefully designed,
explicit instruction (e.g., exposure to instructional targets,
systematic instructional language, multiple opportunities to
respond) are delivered consistently. Rather than requiring
extensive professional development or dedicated resources,
Story Friends requires minimal training of educational staff
and can be incorporated into a common context in early
childhood classrooms (e.g., the listening center). Procedural
fidelity (e.g., setting up materials) was the responsibility
of educational staff, whereas fidelity of instructional
components was accomplished via the automated instruction.

The efficacy of the automated approach incorporated
in Story Friends has been examined in a series of studies
(Greenwood et al., in press; Kelley et al., 2015; Spencer,
Goldstein, Sherman, et al., 2012). Across studies, large effect
sizes were demonstrated on proximal measures of
vocabulary. During the iterative development of Story Friends, the
curriculum was modified on the basis of an analysis of
learning patterns. Children subsequently were able to define
more of the targeted words. No significant differences were
found on the Peabody Picture Vocabulary Test-Fourth
Edition (PPVT-IV; Dunn & Dunn, 2007), however, because the
targeted words are not assessed in standardized tests
except as items that appear at higher age levels.

Proximal measures that tested responses to story
questions targeted in the program showed clear effects, although
some correct responding at pretesting after hearing the story
once left limited room for improvement. To assess whether
children could demonstrate generalized ability to answer
questions about stories, the Assessment of Story
Comprehension (ASC; Spencer, Goldstein, Kelley, Sherman, &
McCune, 2015) was developed. Participants demonstrated
improvement in their ability to answer inferential questions
relative to a comparison group (Kelley et al., 2015).

In these prior studies, learning in the Story Friends
condition was compared to that in business-as-usual
conditions. To determine whether the embedded lessons were
responsible for these robust effects, it was necessary to test
the effects of hearing the words in the stories without
embedded lessons. Thus, Kelley et al. (2015) included untaught
words in their single-case experimental designs and tested
children on these words, which were contained in the stories
but not targeted by the explicit instruction. Exposure alone
rarely resulted in learning of the untaught words.
To confirm this finding in the current randomized
controlled trial, a storybook condition that provides exposure
to words and stories in the same small-group format offers
a more appropriate comparison condition than a
business-as-usual control. This has the potential of elucidating the
role of our hypothesized active ingredient (explicit, embedded
instruction) in the intervention. Another benefit of an
exposure-only comparison is that it makes the vocabulary
outcome measures “fair” to the comparison group; children
in the comparison condition are exposed to the words on
the measures. This would not be the case with a
business-as-usual control group, as children would not have
opportunities to learn the tested words in their Tier 1 experiences.

In previous evaluations of Story Friends (Greenwood
et al., in press; Kelley et al., 2015), research staff was
responsible for facilitating the listening center. Fidelity of
implementation was high, even when less-experienced
undergraduate students served as facilitators. Pilot studies of
educator-led Story Friends also indicated that high-fidelity
implementation of the program was feasible;
paraprofessionals were able to manage the listening center procedures
with minimal training. In the current study, educational staff
in the classroom implemented Story Friends, a critical next
step in examining the feasibility of the program.

The purpose of this study was to determine whether
embedded lessons (experimental group) produced more
learning of vocabulary and comprehension skills than listening
to the same stories without embedded lessons (comparison
group) when implemented in the context of small-group
listening centers in preschool classrooms by educational
staff. We hypothesized that the Story Friends curriculum
would produce significantly better outcomes on measures
of vocabulary learning and question answering than simply
listening to the stories. Given previous research on the
influence of fidelity and preintervention language skills on
treatment effects (e.g., Penno, Wilkinson, & Moore, 2002;
Zucker et al., 2013), moderator effects of these variables
were examined. Higher doses and preintervention language
skills were expected to predict improved learning. The
following research questions were addressed:

1. Controlling for children’s language pretest scores, what
are the effects of the Story Friends program on proximal
measures of vocabulary and comprehension learning?

2. What are the effects of the Story Friends program on
distal language measures?

3. Are observed treatment effects in the experimental
group moderated by the number of instructional
book listens attained or by the children’s pretest skills
on the Clinical Evaluation of Language Fundamentals-Preschool
(CELF-P; Wiig, Secord, & Semel, 2004)?

Method 

Experimental Design and Power Estimate 

A cluster randomized design with children nested in
classrooms was used to compare the effects of Story Friends
with and without embedded lessons on measures of
vocabulary learning and comprehension (Rhoads, 2011). Power
estimates were computed to determine the number of
classrooms needed to detect treatment effects with power of .80.
The alpha level was set at .05, and intracluster correlations
(ICCs) were set at .20 on the basis of a review of education
intervention studies (Hedges & Hedberg, 2007). A child-level
covariate was included in the power analyses to capture
probable correlations between child-level variables (e.g.,
pretreatment performance on language measures such as the
CELF-P) and treatment effects. The child-level covariate
was set at .52 on the basis of previous research with a
similar sample (Greenwood et al., in press). The power
estimates indicated that a sufficiently powered study would
include 30 classrooms with four to six children per
classroom. Attrition at the child level was accounted for with the
addition of 15% more children, but attrition was expected to
be unlikely at the classroom level.
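
The way an ICC of .20 limits the information carried by children nested in classrooms can be illustrated with the standard design effect for equal-sized clusters, 1 + (m - 1) x ICC. The sketch below is not the power analysis reported here, which also included a child-level covariate; it only shows, for the stated ICC and cluster sizes of four and six children, how clustering reduces the effective sample size. The function names are illustrative.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    total_n = n_clusters * cluster_size
    return total_n / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Values taken from the text: 30 classrooms, 4 to 6 children each, ICC = .20.
    for m in (4, 6):
        deff = design_effect(m, 0.20)
        n_eff = effective_sample_size(30, m, 0.20)
        print(f"m = {m}: design effect = {deff:.2f}, effective n = {n_eff:.1f}")
```

With six children per classroom the design effect is 2.0, so 180 children carry roughly the information of 90 independent observations before any covariate adjustment.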

Participants 

The study was conducted in public school pre-K
classrooms in two urban sites: Columbus, Ohio, and Kansas City,
Kansas. There were 24 classrooms in Ohio and eight
classrooms in Kansas. Classrooms served primarily students
from families with low incomes. The Ohio programs were full
day, 5 days per week, and the Kansas programs were half
day, 4 days per week. At each site, classrooms were randomly
assigned to the experimental and comparison conditions.
The flow of participants through the experiment is shown in
Figure 1.

Figure 1. CONSORT diagram showing the flow of participants
throughout the experiment.

To identify participants at increased risk and likely to
benefit from Tier 2 intervention, we considered information
from the Individual Growth and Development Indicators
(IGDI) Oral Language screening measures (Picture Naming
and Which One Doesn’t Belong; Bradfield et al., 2014)
and a standardized norm-referenced measure (PPVT-IV).
The goal was to identify six participants per classroom. All
eligible children completed the two IGDIs. On the Picture
Naming IGDI, children were asked to verbally label a set
of 15 pictures. On the Which One Doesn’t Belong IGDI,
children were presented with a set of three pictures (e.g.,
shirt, pants, and boat) and asked to point to the picture that
did not belong. A cutoff score provided by the IGDI
developers was applied; all children with scores above the cutoff
score on both IGDIs were excluded as Tier 2 candidates.
Children with scores below the cutoff score on one or both
IGDIs were given the PPVT-IV. The first six children with
standard scores between 1.5 and 1.0 SDs below the mean
(standard score = 78-85) were included. In classrooms
in which fewer than six children had scores between 78 and
85, we extended the range up and down to include sufficient
participants (standard score range = 71-96).
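
The two-step screening logic can be sketched as follows. This is an illustration under stated assumptions, not the procedure as coded by the research team: the IGDI cutoff values are not reported here, so they appear as hypothetical parameters; a score exactly at a cutoff is treated as not below it; and the extended range is assumed to be used only to fill slots left open after the primary range is exhausted.

```python
from typing import List, Tuple

PRIMARY_RANGE = (78, 85)    # PPVT-IV standard scores, from the text
EXTENDED_RANGE = (71, 96)   # widened range used when too few children qualified


def needs_ppvt(picture_naming: int, wodb: int,
               pn_cutoff: int, wodb_cutoff: int) -> bool:
    """True if the child scored below the cutoff on one or both IGDIs.

    Cutoffs are hypothetical parameters; the text does not report them.
    """
    return picture_naming < pn_cutoff or wodb < wodb_cutoff


def select_participants(ppvt_scores: List[Tuple[str, int]],
                        target: int = 6) -> List[str]:
    """Pick up to `target` children, in the order they were tested.

    Children in the primary range are taken first; if fewer than `target`
    qualify, the remaining slots are filled from the extended range.
    """
    def in_range(score: int, bounds: Tuple[int, int]) -> bool:
        return bounds[0] <= score <= bounds[1]

    chosen = [cid for cid, s in ppvt_scores if in_range(s, PRIMARY_RANGE)][:target]
    if len(chosen) < target:
        extras = [cid for cid, s in ppvt_scores
                  if cid not in chosen and in_range(s, EXTENDED_RANGE)]
        chosen += extras[:target - len(chosen)]
    return chosen
```

For example, with hypothetical scores `[("a", 80), ("b", 90), ("c", 84), ("d", 70), ("e", 72)]` and a target of three, children a and c qualify in the primary range and b fills the remaining slot from the extended range.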

A total of 163 students were enrolled in the study,
with an average of five students per classroom. As can be
seen in Table 1, statistical tests revealed no significant
differences between the groups on demographic, developmental,
or attrition variables. Children averaged 58 months of age
at pretest. Standard scores averaged 83.9 and 84.2 at pretest
for the PPVT-IV and 83.1 and 80.3 for the CELF-P for the
experimental and comparison groups, respectively. The
percentages of children with individualized education
programs averaged 2.5% and 5.1% of the samples, and children
who were English language learners averaged 18.8% and
17.9% of the samples for the experimental and comparison
groups, respectively. Child attrition totaled 6% over the
course of planned data collection.

Table 1. Sample characteristics by experimental groups and sites.

                        Group                           Statistic
Variable / Site         Experimental    Comparison      Estimate      df    p
--------------------------------------------------------------------------------
Children (n)
  Ohio                  60              54
  Kansas                25              24
  All                   85              78
Classrooms (n)
  Ohio                  12              12
  Kansas                4               4
  All                   16              16
Age at start (months), M (SD)
  Ohio                  57.4 (3.38)     57.8 (3.44)     t = 0.56      112   .575
  Kansas                57.5 (3.03)     58.9 (3.78)     t = 1.44      47    .156
  All                   57.4 (3.26)     58.1 (3.56)     t = 1.28      161   .203
Individualized education program status (% yes)
  Ohio                  1.7             3.70            χ² = 0.46     1     .498
  Kansas                5.3             8.30            χ² = 0.15     1     .695
  All                   2.5             5.10            χ² = 0.72     1     .396
English language learner status (% non-English)
  Kansas                48.0            41.70           χ² = 0.20     1     .656
  All                   18.8            17.90           χ² = 0.02     1     .886
CELF-P pretest, M (SD)
  All                   83.10 (11.07)   80.32 (12.56)   t = 1.48      153   .147
CELF-P posttest, M (SD)
  All                   88.94 (10.12)   87.64 (11.62)   t = 0.74      153   .461
PPVT-IV pretest, M (SD)
  All                   83.90 (5.32)    84.18 (5.70)    t = 0.32      153   .748
PPVT-IV posttest, M (SD)
  All                   93.29 (8.31)    91.83 (9.22)    t = 1.04      153   .301
Sample size by unit, M (SD)
  1                     84              77
  2                     79              76
  3                     79              76
  4                     79              76
  5                     79              74
  6                     55              50
  Omnibus                                               χ² = 0.12     5     .999

Note. Standard deviation given in parentheses. CELF-P = Clinical Evaluation of Language Fundamentals-Preschool; PPVT-IV = Peabody
Picture Vocabulary Test-Fourth Edition.

Setting and Procedures

In all classrooms, children participated in small-group
(approximately three children) listening centers. The
grouping of children was flexible and determined by the teacher.
Classroom staff, a teacher, or a teacher’s aide was responsible
for facilitating the listening center (e.g., helping children keep
headphones on or turn pages). Children followed along in
storybooks, listened to accompanying prerecorded audio,
and responded to questions and prompts to say words and
definitions. Research staff supported teacher implementation
by providing materials, assisting with scheduling, and
troubleshooting equipment; research staff also monitored fidelity.
Research staff was responsible for the administration and
scoring of child assessments.

The Story Friends curriculum included two book
series (Forest Friends and Jungle Friends). Each book series
had an introductory book designed to familiarize participants
with characters and procedures, nine instructional books,
and three review books. The nine instructional books in
each series were divided into three units of three instructional
books plus a review book. For the introductory and
instructional books, children listened to each book three times
in a week. Due to constraints of scheduling assessments,
repeated readings of the review book were not possible.
In those weeks, children received one session. Appendix 2
provides an overview of the timeline for intervention and
assessment procedures.

In the experimental condition, the storybooks and
prerecorded audio included embedded lessons on challenging
vocabulary words and story questions (cf. Spencer, Goldstein,
Sherman, et al., 2012). Each instructional book included
two embedded lessons for each of two challenging
vocabulary words (e.g., enormous, brave) and one embedded lesson
for each of three inferential story questions (e.g., “Why was
Leo sad?”). The review book included brief lessons for each
of the six vocabulary words included in the three instructional
books in that unit. The prerecorded storybooks were 9 to
12 min in length. Because children heard storybooks three
times per week, they received 30 to 40 min of additional
instruction per week. Introductory books did not include
instruction, and children listened to review books just once. Thus,
across the six units, participants received about 10 hr total of
instruction. In Kansas, participants completed five of the
six units due to school year scheduling; the sixth unit was
completed in Ohio only. Across the three listens to the
instructional books and the single listen to the review books, children
received seven embedded lessons for each challenging
vocabulary word and three lessons for each story question.
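
The lesson counts and total instructional time follow from the schedule described above, and the arithmetic can be checked with a short sketch. The constant and function names are illustrative, and one input is an assumption: the length of the review book recordings is not stated, so they are treated here as matching the 9 to 12 min instructional recordings.

```python
LISTENS_PER_INSTRUCTIONAL_BOOK = 3
INSTRUCTIONAL_BOOKS_PER_UNIT = 3
REVIEW_LISTENS_PER_UNIT = 1
UNITS = 6
BOOK_MINUTES = (9, 12)  # shortest and longest recording, from the text


def lessons_per_word(lessons_per_listen: int = 2, review_lessons: int = 1) -> int:
    """Embedded lessons a child hears for one vocabulary word."""
    return lessons_per_listen * LISTENS_PER_INSTRUCTIONAL_BOOK + review_lessons


def lessons_per_question(lessons_per_listen: int = 1) -> int:
    """Embedded lessons for one story question across the three listens."""
    return lessons_per_listen * LISTENS_PER_INSTRUCTIONAL_BOOK


def total_hours(minutes_per_book: int) -> float:
    """Listening time across all units.

    Assumes review recordings are as long as instructional ones (not stated).
    """
    listens_per_unit = (INSTRUCTIONAL_BOOKS_PER_UNIT * LISTENS_PER_INSTRUCTIONAL_BOOK
                        + REVIEW_LISTENS_PER_UNIT)
    return UNITS * listens_per_unit * minutes_per_book / 60
```

Under these assumptions, `lessons_per_word()` gives 7 and `lessons_per_question()` gives 3, matching the counts in the text, and `total_hours` ranges from 9 to 12 hr for the shortest and longest recordings, which brackets the reported figure of about 10 hr.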

In the comparison condition, participants were exposed 
to the same stories containing the same vocabulary words, 
but with no embedded lessons. Participants followed the
same procedures (e.g., repeated readings of the instructional
books), but the storybooks were slightly shorter
without the embedded lessons.

Outcome Measures 

Researcher-developed and standardized measures were
administered to identify participants, measure proximal
and distal outcomes, monitor fidelity of implementation,
and assess teacher satisfaction. Learning of instructional
targets was assessed using two researcher-developed measures:
the Unit Vocabulary Test (UVT) and the ASC. Trained
research staff administered the UVT and the ASC
individually to participants.

The UVT was administered prior to and immediately
following each unit of instruction (i.e., three instructional
books and one review book) and tested each of the six
vocabulary targets for the unit. A definitional task was selected
for the UVT to provide a rigorous measure of vocabulary
learning. Multiple-choice or picture-pointing tasks could be
answered correctly by chance, whereas a definitional task had
no chance component. The definitional task also allowed for
incremental scoring to be sensitive to partial word knowledge.

For each item on the UVT, participants were asked
an open-ended question: “What does [target word] mean?”
If participants did not provide a correct definition, a
standard cloze prompt (“[target word] means ...”) was delivered
one time. Participant responses were transcribed in real
time; audio recordings were collected for reliability purposes.
Trained research assistants scored the UVT using detailed
scoring guides. Each response could receive 2, 1, or 0 points.
A 2-point response was the taught definition, a synonym,
or another complete response. A 1-point response was a
response that indicated some knowledge of the word, such
as an example from the story (e.g., “An elephant is
enormous”). A 0-point response was an unrelated response, an
“I don’t know” response, or no response. Scoring guides
included multiple sample answers for each scoring category
for each item.
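
Once each response has been rated by a trained scorer, tallying a unit score is simple arithmetic. The sketch below assumes, as the description implies, that a unit total is the sum of the six item ratings (so it ranges from 0 to 12); the function names and the gain helper are hypothetical and are not part of the published scoring guides.

```python
from typing import Sequence

WORDS_PER_UNIT = 6
VALID_POINTS = {0, 1, 2}


def uvt_unit_score(item_points: Sequence[int]) -> int:
    """Sum the 0/1/2 ratings for the six target words in one unit."""
    if len(item_points) != WORDS_PER_UNIT:
        raise ValueError(f"expected {WORDS_PER_UNIT} items, got {len(item_points)}")
    if any(p not in VALID_POINTS for p in item_points):
        raise ValueError("each item must be scored 0, 1, or 2")
    return sum(item_points)


def uvt_gain(pre: Sequence[int], post: Sequence[int]) -> int:
    """Post-unit total minus pre-unit total (hypothetical helper)."""
    return uvt_unit_score(post) - uvt_unit_score(pre)
```

For instance, a child rated `[0, 0, 1, 0, 0, 0]` before a unit and `[2, 1, 2, 0, 1, 0]` afterward would show a gain of 5 points out of a possible 12.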

The ASC served as an outcome measure for the
comprehension domain. The ASC is a curriculum-based measure
designed to assess children’s ability to answer questions
about stories. The ASC has evidence of good psychometric
properties, including high internal consistency (Cronbach’s
α = .96), test-retest reliability (r = .82), and strong
correlations with measures of oral language (e.g., r = .81 with
CELF-P), making it an appropriate tool for this investigation.

Trained research staff administered the ASC
individually, and participants completed one ASC story at each
of seven time points (preintervention and after each unit).
The ASC includes nine parallel forms; a standard sequence
of forms was used for all participants. To administer the
ASC, the examiner reads a short story and asks the child to
respond to a series of questions about the story. Questions
include three literal questions about key story events (e.g.,
“What was Danny doing in this story?”); a question about
a vocabulary word in the story (e.g., “What does injured
mean?”); and four inferential questions about character
actions (e.g., “Why did Danny’s mom give him a hug?”),
character emotions (e.g., “Why was Danny sad?”), or
poststory predictions (e.g., “Do you think Danny will ride his
bike down the big hill again?”). The inferential question
types were similar to those targeted in Story Friends. The
ASC is scored on a scale of 0 to 17. Literal and inferential
questions were scored on a 3-point scale. A score of 2 was
awarded for a complete and clear response (e.g., “He will get
hurt”), a score of 1 was awarded for a correct response that
was incomplete or unclear (e.g., “Fall down”), and a score
of 0 was awarded for an incorrect or unrelated response
(e.g., “Watch TV”). The vocabulary item was scored on a
scale of 0 to 3.
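
The maximum ASC total follows from the item counts and point values just described: seven questions worth up to 2 points each plus one vocabulary item worth up to 3. A minimal sketch of that tally is given below; the names are illustrative, and the ratings themselves would come from trained scorers.

```python
from typing import Sequence

N_LITERAL = 3                 # literal questions per ASC story
N_INFERENTIAL = 4             # inferential questions per ASC story
QUESTION_POINTS = (0, 1, 2)   # literal and inferential items
VOCAB_POINTS = (0, 1, 2, 3)   # single vocabulary item


def asc_total(literal: Sequence[int], inferential: Sequence[int], vocab: int) -> int:
    """Total ASC score for one story form."""
    if len(literal) != N_LITERAL or len(inferential) != N_INFERENTIAL:
        raise ValueError("expected 3 literal and 4 inferential item scores")
    if any(p not in QUESTION_POINTS for p in (*literal, *inferential)):
        raise ValueError("question scores must be 0, 1, or 2")
    if vocab not in VOCAB_POINTS:
        raise ValueError("vocabulary score must be 0, 1, 2, or 3")
    return sum(literal) + sum(inferential) + vocab


def asc_max() -> int:
    """Highest attainable total under the item counts above."""
    return (N_LITERAL + N_INFERENTIAL) * max(QUESTION_POINTS) + max(VOCAB_POINTS)
```

Here `asc_max()` evaluates to 17, the top of the reported scale, and a child who earns no credit on any item receives a total of 0.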

The UVT and ASC served as proximal measures of
student learning; two standardized, normative language
measures (the PPVT-IV and the CELF-P) were
administered at pretest and posttest to measure distal outcomes.
Information from the PPVT-IV was also considered for
participant identification. Participants completed three
subscales of the CELF-P (Sentence Structure, Word Structure,
and Expressive Vocabulary) to obtain a Core Language
score, which has a normative mean of 100 and an SD
of 15.

Dosage 

The attendance of children in both groups at all
scheduled sessions was monitored as an indicator of exposure to
each curriculum condition. The intended dosage was 10
listens per unit (three listens each to three instructional books
and one listen to the review book). Reasons for missing
these opportunities to learn included absences from school
for sickness or other excused reasons and cancellations due
to school schedule changes and winter weather. Make-up
sessions were arranged whenever possible. On rare occasions,
a child missed an entire unit due to an extended absence.
Overall, dosage was high in both groups and at both sites.
In Ohio, the experimental group received an average of
8.79 listens per unit (SD = 1.56, range = 0-10), and the
comparison group participated in an average of 8.49 listens
per unit (SD = 1.90, range = 1-10). Means were similar in
Kansas: 8.83 listens per unit (SD = 1.67, range = 0-10) for
the experimental group and 8.83 listens per unit (SD = 1.43,
range = 5-10) for the comparison group. Across sites, the
number of listens declined slightly in the experimental group
at a rate of -0.13 listens per unit. There was no decline in
the number of listens in the comparison group.

Implementation Fidelity 

Research staff assessed fidelity of intervention using
checklists that included six items reflecting critical
components of the intervention (i.e., each child had headphones,
each child had a book, facilitator had headphones, correct
audio was playing and functioning properly, entire audio
was played, listening center was quiet with few distractions).
Observations were conducted every 1 to 2 weeks. Research
staff observed each listening center (e.g., two listening center
observations per classroom per visit). The average number
of observations per classroom was 18 for treatment
classrooms and 10 for comparison classrooms.

Rigorous standards were set for the fidelity of
implementation of the automated interventions (at least 90%).
Overall fidelity of implementation was quite high: 94.9%
(range = 33%-100%) across all classrooms. In treatment
classrooms, fidelity of implementation was 93.9% (33%-100%).
In comparison classrooms, fidelity of implementation
was 96.6% (67%-100%). Instances of low fidelity were rare;
most often these were isolated incidents in which the listening


6 Journal of Speech, Language, and Hearing Research •1-17 



center was interrupted (e.g., fire drill, other change in
schedule). Across all observations, the item on the fidelity
checklist that was most often missing was “listening center
was quiet with few distractions”; only 84% of observations
indicated that this was the case.

Research staff completed training and “check-out”
procedures for all measures prior to administration. Audio
recordings of assessment sessions were used to conduct
fidelity checks for administration of the measures throughout
the study (30% of measures at each unit of assessment).
To examine fidelity of test administration, trained research
assistants listened to audio recordings and completed
procedural checklists specific to each measure (e.g., assessor
delivered items exactly as written, provided standard prompts).
The criterion for fidelity of assessment was 85% of items on
the checklist. All observed sessions (374 observations across
sites) exceeded this criterion; overall fidelity was very high.
For the UVT, average fidelity of administration was 99.97%
(range = 93%-100%). For the ASC, average fidelity of
administration was 100% (range = 85%-100%).

Scoring Reliability 

A primary scorer, a trained member of the research 
team, scored all measures. At each site, a second research 
team member scored approximately one third of all the 
assessments for the purpose of evaluating scoring reliability. 
Detailed scoring guides were created to ensure reliable 
scoring. For both the UVT and the ASC, scoring rubrics 
and sample responses were created for each item. For 
the UVT, scorers were blinded to condition (treatment, 
comparison) and to the unit of assessment (pre-post). For 
the ASC, scorers were blinded to condition. Because the 
ASC forms were administered in a standard order, it was 
not possible for scorers to be blind to unit of assessment. 

On both measures, an item-by-item comparison was
made to determine agreement or disagreement. Scoring
reliability was calculated by dividing the total number of
agreements by the total number of agreements plus
disagreements and multiplying by 100. Kappa coefficients also
were calculated.
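
The agreement calculation described above can be sketched in a few lines. This is a minimal illustration, not the study's scoring software: the function names and the item scores are hypothetical, and the unweighted Cohen's kappa shown here is an assumption, because the text says only that kappa coefficients were calculated.

```python
from collections import Counter


def percent_agreement(scores_a, scores_b):
    """Item-by-item agreement: agreements / (agreements + disagreements) * 100."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and equal in length")
    agreements = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100.0 * agreements / len(scores_a)


def cohens_kappa(scores_a, scores_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(scores_a)
    p_o = percent_agreement(scores_a, scores_b) / 100.0
    counts_a, counts_b = Counter(scores_a), Counter(scores_b)
    # Chance agreement: product of the two scorers' marginal proportions,
    # summed over every score category either scorer used.
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        return 1.0  # both scorers used a single identical category throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical item scores (0, 1, or 2 points) from two scorers; not study data.
primary = [0, 0, 0, 0, 0, 0, 2, 1, 0, 2]
second = [0, 0, 0, 0, 0, 0, 2, 1, 0, 1]
print(percent_agreement(primary, second))
print(round(cohens_kappa(primary, second), 3))
```

Because kappa discounts agreement expected by chance, a sample dominated by 0-point answers can show high raw agreement alongside a noticeably lower kappa.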

For the UVT, the high number of 0-point answers
(e.g., “I don’t know” or no response) at pretest made scoring
agreement more likely. Thus, we examined agreement
separately for pretest and posttest. Mean agreement was 98%
at pretest (κ = .84) and 97% at posttest (κ = .91). On the
ASC, mean agreement was 93% (κ = .88) across all units of
assessment. Cross-site scoring reliability was examined for
10% of all assessment samples across units of assessment.
Mean agreement was 99% (κ = .99) on the UVT and 93%
(κ = .88) on the ASC.

Social Validity Assessment 

Following intervention, teachers and teachers’ aides
were asked to complete anonymous consumer satisfaction
surveys (approximately 93% return rate). For the purposes
of evaluating Story Friends, only the survey results from
the teachers (n = 20) and teachers’ aides (n = 12) in the
treatment condition who completed the survey were
examined. The survey consisted of 17 items related to satisfaction
with the intervention (e.g., “The listening centers are worth
the time and effort and I would implement them again
sometime in the future”), ease of implementation (e.g., “I could
fit the listening center into my classroom schedule three times
each week without much difficulty”), and participation in
the research study (e.g., “I had the support I needed from
the research team to conduct the listening center”). Teachers
responded on a scale of 1 (strongly disagree) to 6 (strongly
agree). The survey included several open-ended questions
(e.g., “If you could change one thing about the intervention,
what would you change?”).

Results 

Data Analysis 

Mixed regression modeling was used to answer
research question 1 regarding the observed effects on
vocabulary and comprehension learning. Estimates of the
between-children and between-classrooms variation are
included in the models to account for the clustering of
multiple observations across time within each child and
across multiple children within each classroom (Snijders &
Bosker, 2012). To control for children’s pre-existing
language differences at the start, pretest scores on the CELF-P
and each instructional unit were included in the model
predicting UVT and ASC scores. Next, differences by
treatment group (experimental vs. comparison) were tested in
the model first for the main effect of entire series (grand
mean collapsed over units) and second for the effects across
the six instructional units, forming a multilevel growth model.
Growth over time was indicated by the interaction of group
(two) by unit (six).

In these models, estimates for first-unit score (i.e.,
intercept) and linear growth across units were allowed to
vary across children. To account for the nesting of children
within classrooms, intercept and growth estimates were
allowed to vary across classrooms such that modeling
accounted for classroom-level differences in first-unit scores
and growth not explained by control and focal predictors.
Because these models did not assume a particular data
structure for the number of data points and because prior data
for the Kansas children were included, the analyses were not
influenced by the missing Kansas data at unit 6.
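
One way to write down the growth model described above is sketched below. The notation is an illustrative assumption rather than the authors' own: t indexes units, i children, and c classrooms; g is the link function (a log link for a count outcome, the identity for an approximately normal outcome); and Unit is assumed to be centered at the first unit so that the intercept represents the first-unit score.

```latex
% Illustrative three-level growth model; symbols are assumptions, not the authors' notation.
\begin{aligned}
g\bigl(\mathrm{E}[Y_{tic}]\bigr) ={}& \gamma_0
  + \gamma_1\,\mathrm{Pretest}_{tic}
  + \gamma_2\,\mathrm{CELF}_{ic}
  + \gamma_3\,\mathrm{Group}_{c}
  + \gamma_4\,\mathrm{Unit}_{t}
  + \gamma_5\,(\mathrm{Group}_{c}\times\mathrm{Unit}_{t}) \\
 &+ \underbrace{u_{0ic} + u_{1ic}\,\mathrm{Unit}_{t}}_{\text{child-level deviations}}
  + \underbrace{v_{0c} + v_{1c}\,\mathrm{Unit}_{t}}_{\text{classroom-level deviations}}
\end{aligned}
```

In this sketch, the coefficient on the Group × Unit term carries the test of differential growth, and the u and v terms are the random intercepts and slopes that absorb child- and classroom-level variation.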

In addition to model estimates and p values for each
predictor, effect size estimates were calculated. Cohen’s f²
(Cohen, 1988) was used as an appropriate local effect size
for representing each variable’s impact in the multivariate
mixed-effects regression models; Cohen’s f² values of .02,
.15, and .35 reflect small, medium, and large effects,
respectively. The marginal semipartial R² values
necessary for computation of Cohen’s f² were derived
following Nakagawa and Schielzeth (2013).
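
As a rough illustration of a local effect size, the commonly used formulation of Cohen's f² for a single predictor compares the variance explained by the full model with that explained by a model omitting the predictor. The sketch below assumes that formulation; the function names and R² inputs are hypothetical, not values reported by the study, and the label for values under .02 is an invention of this example.

```python
def cohens_f2_local(r2_full, r2_reduced):
    """Local Cohen's f^2 = (R2_full - R2_reduced) / (1 - R2_full).

    r2_full:    variance explained with the predictor of interest included
    r2_reduced: variance explained with that predictor removed
    """
    if not (0.0 <= r2_reduced <= r2_full < 1.0):
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    return (r2_full - r2_reduced) / (1.0 - r2_full)


def effect_label(f2):
    """Map f^2 onto the conventional .02 / .15 / .35 benchmarks."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "below the small benchmark"  # label chosen for this sketch only


# Hypothetical marginal R^2 values, chosen only to illustrate the arithmetic.
f2 = cohens_f2_local(r2_full=0.50, r2_reduced=0.15)
print(round(f2, 2), effect_label(f2))
```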

Because the raw data distribution of UVT for vocabulary
was positively skewed due to a large number of
zeros and a long positive tail, Poisson regression was used
to fit the distribution (Agresti, 2007). Poisson regressions
with log links between the predictors and the dependent
variable were used at the core of the multilevel
growth modeling. The distribution of ASC comprehension
scores was generally normal, with no substantial
skewness (0.318) or kurtosis (-0.555) present; therefore,
general linear mixed modeling was appropriate for modeling
comprehension.

To answer research question 2 regarding the effects
on posttest gains in PPVT-IV and CELF-P scores, linear
mixed models were used. Because each child had one
posttest score for each measure, only nesting of children within
classrooms was accounted for by the inclusion of a
random intercept. PPVT-IV and CELF-P pretest
scores were included as predictors to account for associated
variance, allowing group to uniquely predict gains from pretest
to posttest on the outcomes. Research question 3
investigated moderation of the growth within the experiment
depending on the child’s starting language skill (CELF-P)
and dosage of exposure to the intervention (total number
of listens). The CELF-P × Unit × Number of Instructional
Book Listens interaction was examined as a predictor of UVT
and ASC scores.

Effects on Vocabulary Learning 

Vocabulary learning is reported in word points per
unit, with a maximum of 12 reflecting complete knowledge
of all six novel words per unit. As can be seen in Table 2,
controlling for children’s initial differences in pretest CELF-P
language and unit pretest vocabulary knowledge, the main
effect of group on children’s vocabulary learning was
large and statistically significant (group β = 1.58, p < .001,
Cohen’s f² = .70). The experimental group grew from a
pretest mean of 0.60 word point (SD = 0.25) to a posttest
mean of 4.00 (SD = 1.45), a mean gain of 3.40 word points
per unit. In contrast, the comparison group grew from a
pretest mean of 0.50 word point (SD = 0.18) to a posttest
mean of only 0.80 word point (SD = 0.36), a mean gain
of only a fraction of a word point (0.32). Learning in the
experimental group was always significantly greater than
that in the comparison group, ranging from 7.2 times greater
at the first unit (β = 2.106, exp(β) = 8.218, SE = 0.146,
p < .001) to 1.9 times greater at the last unit (β = 1.048,
exp(β) = 2.853, SE = 0.184, p < .001). Effect sizes were
medium on the basis of Cohen’s f² value of .17 for the
variance explained by the interaction between group and unit
and for the main effect of group alone (f² = .23). In contrast,
the comparison group’s gains were negligible and
nonsignificant (i.e., a rate of 0% per unit): β = 0.016, exp(β) =
1.016, SE = 0.036, p = .67.
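
Because the vocabulary models use a log link, each coefficient is on the log scale, and exponentiating it gives a multiplicative rate ratio. The short sketch below (function names are illustrative) shows how the coefficients of 2.106 and 1.048 reported for the first and last units map onto the "times greater" figures, and how a coefficient converts to a percentage change.

```python
import math


def rate_ratio(beta):
    """Exponentiate a log-link coefficient to obtain a multiplicative rate ratio."""
    return math.exp(beta)


def percent_change(beta):
    """Percentage change in the expected count per one-unit increase in the predictor."""
    return 100.0 * (math.exp(beta) - 1.0)


# Group coefficients reported in the text for the first and last units.
for label, beta in [("first unit", 2.106), ("last unit", 1.048)]:
    rr = rate_ratio(beta)
    print(f"{label}: exp(beta) = {rr:.3f}, about {rr - 1:.1f} times greater")

# Group x Unit coefficient reported in Table 2.
print(f"Group x Unit: {percent_change(-0.210):.2f}% per unit")
```

Small differences in the third decimal place relative to the published values are to be expected, since the published coefficients are themselves rounded.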

In Figure 2, low pretest scores for each unit are evident
for both the experimental and comparison groups. Although
equally low posttest scores are evident for the comparison
group, the experimental group demonstrated clear
improvements in word learning for each unit, averaging 3.4 word
points. Because new words were introduced at each unit,
means for each unit were not expected to grow. However,
the interaction of experimental group by curriculum unit over
time was statistically significant (Group × Unit β = -0.210,
SE = 0.042, p < .001), reflecting an unexpected declining
trend over time. ICCs in these analyses were .103 for
children and .066 for classrooms, indicating
that differences between children explained 10.3% of the
variance in vocabulary posttest scores, whereas differences
between classrooms explained 6.6% of the variance in
vocabulary posttest scores.

Effects on Comprehension Learning 

Table 2. Mixed Poisson regression results for vocabulary learning and mixed linear regression results for comprehension.

                          Vocabulary learning                                    Comprehension
Model/fixed effects       β (SE)            p       Effect size  Increase^a (%)  β (SE)            p       Effect size

Averaged across units
  Intercept               -2.702 (0.422)    <.001                                -4.655 (1.205)    <.001
  Pretest score           0.105 (0.021)     <.001   .022         11.09
  CELF-P scale score      0.027 (0.005)     <.001   .116         2.69            0.130 (0.014)     <.001   0.206
  Group^b                 1.580 (0.116)     <.001   .698         385.46          -0.230 (0.337)    .495    0.001
Growth across units
  Intercept               -2.822 (0.415)    <.001                                -6.274 (0.147)    <.001
  Pretest score           0.136 (0.022)     <.001   .020         14.60
  CELF-P scale score      0.027 (0.005)     <.001   .130         2.75            0.147 (0.013)     <.001   0.316
  Group^b                 2.023 (0.135)     <.001   .800         655.90          -0.071 (0.329)    .829    0.001
  Unit^c                  0.013 (0.036)     .731    .046         1.26            0.109 (0.075)     .149    0.005
  Group^b × Unit^c        -0.210 (0.042)    <.001   .168         -18.94          -0.070 (0.108)    .514    0.002

Note. Effect size is given as Cohen’s f². CELF-P = Clinical Evaluation of Language Fundamentals-Preschool.
^a Represents the percentage increase in word points learned per unit increase in the independent variable.
^b 0 = comparison, 1 = treatment.
^c Centered at unit 1.

Figure 2. Pretest and posttest vocabulary performance for
experimental and comparison groups.

After controlling for initial differences, there were no
significant findings for the effects of group or the Group ×
Unit interaction on the ASC total score. Children in the
experimental group achieved a mean of 8.3 (SD = 4.0) at
the last assessment versus 7.2 (SD = 3.9) in the comparison
group (improvements of 2.2 and 1.5 points, respectively).
Likewise, no significant differences between groups were
seen at each unit (see Figure 3). However, a main effect of
site (β = 1.818, SE = 0.428, p < .001) was noted, indicating
that children in Ohio averaged 1.8 points higher on the ASC
total score compared with children in Kansas. ICCs were
calculated for children and for classroom levels (child ICC =
.214, class ICC = .207). Differences between children explained
21.4% of the variance in comprehension posttest scores,
whereas differences between classrooms explained 20.7% of
the variance in comprehension posttest scores.
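
For a linear mixed model with random intercepts, an ICC is conventionally the share of total variance attributable to one level of clustering. The sketch below assumes that definition; the text does not state how the ICCs were computed, and the variance components used here are hypothetical values chosen only so that the resulting shares match the reported .214 and .207.

```python
def variance_shares(var_child, var_classroom, var_residual):
    """Return (child ICC, classroom ICC) as shares of total variance."""
    if min(var_child, var_classroom, var_residual) < 0:
        raise ValueError("variance components must be non-negative")
    total = var_child + var_classroom + var_residual
    if total == 0:
        raise ValueError("total variance must be positive")
    return var_child / total, var_classroom / total


# Hypothetical variance components (not reported in the study).
child_icc, class_icc = variance_shares(
    var_child=2.14, var_classroom=2.07, var_residual=5.79
)
print(f"child ICC = {child_icc:.3f}, classroom ICC = {class_icc:.3f}")
```

When slopes are also allowed to vary, as in the growth models, the variance attributable to each level changes with the time point, so a single ICC of this kind is best read as a summary at the model's reference point.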


Figure 3. Performance on the Assessment of Story Comprehension at 
pretest and after each unit for experimental and comparison groups.

Effects on Distal Language Measures

After accounting for initial differences in pretest scores, 
neither norm-referenced language measure (PPVT-IV, 
CELF-P) was influenced by group. However, both groups 
made gains in standard scores of approximately 0.5 SD 
(see Table 1). 

Moderation in the Experimental Group 

Because intervention effects on word points were
significantly greater for the experimental group, moderation
was explored. Analyses indicated that instructional book
listens were not significant as a main effect. Their
interaction with instructional unit and CELF-P was significant
(p = .03) but represented a small effect size (Cohen’s f² =
.003). Children with higher CELF-P scores benefited
most from additional book listens, but this positive effect
of additional book listens in early units dropped over time
after unit 2.

Social Validity Results 

In general, teachers were satisfied with the
intervention (M = 5.05 and SD = 1.15 on items related to
satisfaction). Teachers agreed that children benefited from
participating in the listening centers (M = 5.25, SD = 0.92)
and that children enjoyed the listening center activities
(M = 5.06, SD = 1.06). However, seven teachers disagreed
with the statement that children enjoyed listening to the
same book three times (M = 4.26, SD = 1.41). Teachers
agreed that it was feasible to have an adult at the listening
center (M = 5.16, SD = 0.90) and somewhat agreed that
it was feasible to integrate the intervention into the daily
classroom schedule and weekly routine (M = 4.84, SD = 1.21).
Responses to the open-ended questions reflected similar
sentiments. Many teachers commented that students
benefited from the program and gave examples of vocabulary
learning and gains in language skills (e.g., answering
questions, talking more, using complete sentences). Teachers
also commented on changes in student behavior related to
paying attention and following directions. Teachers frequently
expressed a desire to include all children in the classroom
rather than just the identified participants. Although
comments were generally quite positive, several teachers
identified challenges to implementation, finding it difficult to
keep children engaged (e.g., children wanted to participate
in a different center) or to make time for the intervention
(e.g., suggesting making the listening center sessions shorter
or less frequent).

Discussion 

The purpose of this study was to examine the effects
of a Tier 2 curriculum implemented in preschool classrooms
by educational staff on the vocabulary and comprehension
learning of preschool children with limited oral language
skills. We sought to determine whether stories with
embedded lessons produced more learning of vocabulary and
comprehension skills compared with the same stories
without embedded lessons. We also examined intervention
effects on distal measures of oral language skill and evaluated
the contributions of dosage and preintervention language
skills on treatment effects.

Results revealed that when storybooks included
embedded lessons, vocabulary learning consistently exceeded
the exposure comparison condition. The effect size
associated with the differences between treatment and
comparison groups was large. Participants in the treatment
classrooms gained an average of 3.4 word points for each
unit. Performance varied substantially among participants,
with three children gaining only 1 word point and a high
performer gaining 54 word points during the year.
Participants in the comparison classrooms demonstrated almost
no gains, averaging less than 2 word points over the course
of the year. The virtual absence of learning of the same
vocabulary words that were included in the story by the
comparison group without the embedded lessons indicates
that storybook listening alone has little effect on the learning
of challenging words. This was the case with untaught
words in our prior single-subject design experiments (Kelley
& Goldstein, 2014; Kelley et al., 2015; Spencer, Goldstein,
Sherman, et al., 2012) and was consistent with previous
research (cf. Beck & McKeown, 2007; Justice, Meier, &
Walpole, 2005); children appear to learn very little about
new vocabulary words from simply hearing stories read
aloud. The facilitative effects of storybook reading seem
to accrue from instructional lessons, even when the
interactions are with an audio-recorded narrator. Perhaps
effects would be as strong or stronger if parents and teachers
provided similar instruction and opportunities for children
to respond. But that would require substantial training and
preparation on their part and may sacrifice
implementation fidelity.

One unexpected finding was the decline in vocabulary
learning over the course of the academic year. The first-unit
performance was surprisingly high and the final-unit
performance was surprisingly low, which was largely
responsible for the downward trend in vocabulary learning. The
high vocabulary learning in the first unit may reflect a
novelty effect. This was unexpected, however, as our prior
research showed that some children seem to gain confidence
in their ability to define challenging words over time.
Indeed, a future line of research may seek to determine how
many weeks of limited progress in small-group Tier 2
intervention provide an indication of the need for additional
(Tier 3) support. Some variability among units may be due
to competing activities that preoccupy children or teachers
around holidays and the end of the school year. Although
consistent word-selection criteria were applied across the
stories, differences in word difficulty may have contributed
to variability among units. Factors such as familiarity
(Gray & Brinkley, 2011), word type (Alt & Plante, 2006),
phonotactic probability, and neighborhood density (Storkel
& Hoover, 2011) have been shown to contribute to
differences in word learning success in experimental tasks; we
did not control for or examine these variables. Likewise,
systematic instructional language may contribute to
differences in lesson effectiveness. For example, some lessons
likely had examples that were more interesting or
meaningful to children than others. Book-level differences could
be examined by counterbalancing the order of books in
subsequent studies.

O’Donnell (2008) has outlined treatment variables
that could explain variability in responding. Some of these
variables (e.g., duration and dosage) were measured and
examined, but others are more difficult to capture (e.g.,
engagement and responsiveness). Waning enthusiasm,
monitoring, or positive feedback by teachers or decreased
interest or engagement by the children represent possible
explanations. Baker, Kupersmidt, Voegler-Lee, Arnold,
and Willoughby (2010) found that implementation of a new
intervention was predicted by teachers’ perceptions of the
program and by the support provided by the center. If
an end-of-the-year decrease in learning is seen regularly in
early childhood settings, these variables may be important
to evaluate in future studies.

The effects of comprehension intervention did not
differ between the two groups; only a slight upward
trajectory was demonstrated. This was the one variable that
showed an effect of site, with children in Ohio averaging
almost 2 points higher than children in Kansas on the ASC.
The lack of significant effects of intervention on
comprehension was unexpected because prior research had shown
significant effects for the inferential questions on the ASC
(Kelley et al., 2015). We suspect that in Story Friends there
are too few teaching trials and opportunities to respond.
Van Kleeck et al. (2006) reported strong effects on literal
and inferential language from an intensive intervention
delivered individually. It appears that the “think aloud”
instructional approach did not produce robust effects in the
automated format, perhaps because the prerecorded
narration does not provide contingent feedback.

Consistent with the recommendations of the National
Reading Panel (2000), multiple measures of outcomes in
the study included both specific, researcher-created
measures (e.g., UVT) and standardized, norm-referenced
measures of general language abilities (PPVT-IV and CELF-P).
Although treatment effects were evident on the
researcher-created measures, differential group effects were not seen for
the PPVT-IV or the CELF-P. It may not be surprising that
these measures were not sensitive to the effects of Story
Friends, as there is no overlap in the words taught and
those tested on the PPVT-IV and CELF-P. Indeed, words
targeted in Story Friends would likely show up as items for
older students in the PPVT-IV. Two recent meta-analyses
of vocabulary intervention studies have reported significant
effects on standardized measures. Marulis and Neuman
(2010) found that large effects were evident on
researcher-created measures of specific vocabulary taught (g = 1.21),
whereas more moderate effects were observed on standardized
measures (g = 0.71). In a meta-analysis of the effects of
vocabulary instruction on comprehension that included older
children, Elleman, Lindo, Morphy, and Compton (2009)
reported an effect size of .29 for standardized measures of
vocabulary. A careful analysis of the words taught in
intervention studies is warranted to help us understand the
potential relation between targeted vocabulary and
vocabulary assessed on various standardized measures.

It is possible that a relatively low-intensity
intervention such as Story Friends (approximately 10 hr total across
the school year) is not sufficient to produce changes in
global language abilities. Many studies of targeted
vocabulary intervention for young children have not included
measures of generalized outcomes (Beck & McKeown, 2007;
Coyne et al., 2007; Loftus et al., 2010; Silverman, 2007).
Those that have examined treatment effects on distal outcome
measures often have failed to find effects (Coyne et al.,
2010; Leung, 2008). For example, Neuman et al. (2011)
found no group differences on the Woodcock-Johnson
Picture Vocabulary Subtest after a supplemental vocabulary
program implemented across the school year. In some
studies, intervention programs that have targeted more global
language skills (e.g., dialogic reading) have produced effects
on distal measures (Hargrave & Senechal, 2000; Lonigan
& Whitehurst, 1998); in other studies, no significant effects
were reported (Lonigan, Anthony, Bloomfield, Dyer, &
Samwel, 1999). Efforts devoted to changing instructional
practice (e.g., professional development, collaborative models)
and environments (e.g., literacy access) have sometimes
resulted in improvements on distal language measures
(Hadley, Simmerman, Long, & Luna, 2000; Neuman, 1999;
Wasik, Bond, & Hindman, 2006). It appears that
improving language abilities of young children to the extent that
effects can be observed on generalized language measures is
quite challenging; it is likely that intensive, sustained
instruction is necessary.

Although no significant group differences on distal
measures were observed, both groups showed
improvement during the school year. Both groups’ scores averaged
more than 1 SD below the mean at pretest. On the PPVT-IV,
the experimental standard score group mean increased
by about 9.39 points and the comparison increased by
7.65 points. On the CELF-P, the experimental standard
score group mean increased by 5.84 points and the
comparison group increased by 7.32 points. It is likely that gains
on these generalized measures are an indication of the
contributions of general, Tier 1 pre-K instruction to language
development. Greenwood et al. (2013) reported similar
gains as the result of Tier 1 instruction across a variety of
preschool settings for children at the Tier 2 performance
level (5.1 points on the PPVT-IV; 7.4 standard score points
on the CELF-P).

Moderation analyses revealed that treatment effects
were relatively unaffected by classroom and site differences.
ASC scores were a little higher in Ohio classrooms. One
possible explanation is that this difference may be due to
preintervention language abilities. There were more children in
Kansas who were English language learners. The ASC may
have been more taxing for them because it requires some
expressive language ability to produce responses.

Pretest scores on normed language tests did not
moderate vocabulary learning, nor was there a main effect for
instructional book listens. This is probably related to the
small range in the number of listens, as implementation was
high throughout the study, with only a small decline as the
study progressed. Although we found a Number of Listens ×
Instructional Units × CELF-P interaction, this was a small
effect (Cohen’s f² = .003). There was a positive effect of
additional book listens for higher CELF-P scores that was
evident only in the early units.

The lack of variation in instructional dosage
represents a strength of this study. The teachers and their aides
achieved exceptionally high implementation fidelity.
Observations indicated that in the vast majority of sessions, all
key elements of the intervention procedure were in place.
Fidelity of 95% was achieved, which is substantially higher
than levels reported by other research groups (e.g., 38%
fidelity observed by Dickinson, 2011), and 81% of children
received eight to nine of the intended 10 listens of books
per instructional unit. One way in which such high fidelity
was achieved was by asking educational staff to be
responsible only for procedural fidelity (e.g., ensuring that
participants listened to the books) rather than for complex
instructional approaches (e.g., explicit vocabulary instruction).
This allowed educational staff to devote resources, such as
professional development and planning time, to other
classroom activities, including Tier 1 instruction. Previous
studies have found that procedural aspects of an intervention
program can be readily implemented with fidelity (Justice
et al., 2008; Pence et al., 2008), and our intervention was
designed with this in mind.

Other studies have used similar instructional targets
(e.g., challenging vocabulary), measures (e.g., definitional
tasks), and populations (e.g., children at risk due to limited
vocabulary knowledge), but few studies have included all
these components (Loftus et al., 2010; Pullen et al., 2010).
In comparison with the interventions examined in these
studies, Story Friends appears to have strong effects on
student learning. Participants in the current study learned
an average of 10.2 words out of a possible 36, representing
28.33% of words taught. In Loftus et al. (2010), at-risk
children in the Tier 2 treatment group averaged a one-word
advantage in definitions over the control group; they
defined 12.5% of the eight words targeted after 240 min of
instruction. Pullen et al. (2010) reported an impressive
posttest definitional knowledge of on average 3.67 of the
eight words targeted in Tier 2 (45.9%) after 200 min of
instruction. Researchers implemented the intervention
in the Pullen et al. and Loftus et al. studies.

Studies of teacher-implemented vocabulary
interventions typically have examined classroom-wide (Tier 1)
curricula. For example, Beck and McKeown (2007) found
that participants receiving the Text Talk intervention learned
16% to 20% of 22 words. Neuman et al. (2011) found
improvements after implementing the World of Words
program; pretest scores on a picture-pointing task ranged from
61% to 78%, indicating that children knew many of the
targeted vocabulary words prior to intervention. Scores
increased to 77% to 88% of words known versus 69% to
79% for the control group. Biemiller and Boote (2006)
reported that children learned 10% of word meanings after
repeated exposure from storybook reading. This percentage
increased to 22% when words were explained to children
and to 41% when instruction was intense. A common finding
in these studies is that children learn only a small percentage
of the novel words taught. There certainly is room for
improvement as we devise additional strategies that may aid
mastery and use measures that may be more sensitive to gains
in the depth of knowledge acquired.

This was the first study to examine all six units of
Story Friends across an entire school year. There was
substantial variability across units, with average learning of
45% of words taught in the first unit versus 11% of words
taught in the last unit. In previous studies of Story Friends,
gains in word learning were larger (56% of words taught
in Kelley et al., 2015; 45% of words taught in Spencer,
Goldstein, Sherman, et al., 2012). It is possible that the
population sampled in the current study was somewhat
different from previous samples. Although means on the
PPVT-IV were similar (Standard Score = 83-84), the
sample mean on the CELF-P was slightly lower (Standard
Score = 80 vs. 86 and 89 in Spencer et al. and Kelley et al.,
respectively).

Conclusions and Future Directions 

The purpose of the current investigation was to
examine the effects of prerecorded, embedded lessons on the
learning of vocabulary and story comprehension abilities
of preschool children. Our findings suggest that this
innovative automated approach has potential as a means of
providing supplemental vocabulary instruction and can be
implemented with high fidelity in preschool classrooms.
The comparison group received exposure to storybooks and
words in the same automated, small-group context to
provide a rigorous test of the embedded lessons. An
important next step would be to compare prerecorded lessons
with storybooks and lessons delivered “live” by teachers. If
stronger effects can be achieved by live delivery, we should
consider ways to support this practice in preschool
classrooms, continuing to emphasize the need for high-fidelity
implementation with minimal requirements for educational
staff.

There was substantial variability in word learning 
among participants in the experimental group, ranging 
from 2% to 75% of words being learned (averaging 28%). 
We used performance on standardized screening and 
norm-referenced measures to identify participants with 
limited oral language as candidates for Tier 2 intervention. 
However, it is clear that the program varied in its 
effectiveness for individual participants. Future research should 
examine ways to match children to tiers of intervention. In 
an MTSS approach, information about learning in response 
to instruction could be used in combination with scores 
on standardized measures. 

Future research should continue to investigate ways 
to enhance vocabulary learning and retention. It is 
possible that we are underestimating the extent of learning by 
using the definition task, as there may be long-term benefits 
even if the child develops partial knowledge of new 
vocabulary words. Repeated experiences with a word in a 
variety of supportive language contexts are likely necessary 
for the development of deep, long-term word knowledge. 
The vocabulary learning that occurs in Story Friends 
may be one important component of the word learning 
process. We ultimately need to evaluate the long-term effects 
of vocabulary instruction on later reading fluency and 
comprehension. 

Our project is unique because we attempted to teach 
challenging vocabulary words, measure definitional 
vocabulary knowledge, and work with at-risk preschool 
children in classroom settings using existing educational staff 
over the course of an entire school year. Our findings 
contribute to an existing body of research on explicit 
vocabulary instruction for preschool children and extend this work 
in important ways. Shared storybook reading has been 
a popular context for embedded vocabulary instruction 
(Beck & McKeown, 2007; Coyne et al., 2007; Justice et al., 
2005); however, few studies to date have examined the 
use of automated methods of delivery. This important 
innovation provides a method for implementation in 
classroom settings without placing excessive demands (e.g., 
extensive professional development, additional personnel) 
on existing resources. Given the well-documented 
challenges of high-fidelity implementation in authentic 
educational settings, there is a clear need for such intervention 
approaches. 

Appendix 1. 

Sample Scripts for Embedded Vocabulary and Story Question Lessons 


Lesson: Embedded vocabulary 

Sample script: 
Wow! The Jungle Friends are thrilled! They are 
excited to go to the carnival. Thrilled. Say 
thrilled. [pause] Thrilled means excited. Tell me, 
what word means excited? [pause] Thrilled! 
Good work! When are you thrilled? [pause] 
What about when you get a present. Or your 
friends come over to play? I bet that makes you 
feel excited. Now lift the flap. [Picture of two 
young boys with party hats and a birthday 
cake] Look! These boys are at a birthday party. 
They are thrilled! They are excited. Tell me, 
what does thrilled mean? [pause] Excited! 
That’s right. 

Notes: 
In each embedded vocabulary lesson, children 
were provided multiple opportunities to 
respond, indicated above by pauses in the 
script. Response opportunities always included 
repetition of the word, saying the word in 
response to the definition, and providing the 
definition. In some lessons, children could lift a 
flap to see a picture related to the target word. 
In other lessons, children had the opportunity 
to provide a gesture or facial expression related 
to the target word. Positive feedback was 
provided by the narrator for many of the 
response opportunities. Because the scripts 
were prerecorded, all children received the 
same opportunities to respond as well as the 
same positive feedback. 


Lesson: Embedded story question 

Sample script: 
Do you think Pablo Porcupine’s friends will listen 
to him the next time they play together? [pause] 
Yes, because they learned that Pablo could 
help them! Pablo was quiet, but he knew how 
to get to the carnival. I bet they’ll listen to Pablo 
from now on! 

Notes: 
In each embedded story question lesson, children 
were provided with a single opportunity to 
respond, indicated above by pauses in the 
script. After the response opportunity, the 
narrator provided a model of an appropriate 
answer as well as a brief explanation of the 
answer. 


Note. Sample scripts are in press by Paul H. Brookes Publishing Co. and are printed with permission of the publisher. Copyright © Paul H. 
Brookes Publishing Co. 

Appendix 2. 

Intervention and Assessment Timeline 


Week | Story | Measures 

Series 1: Jungle Friends 
1 | Introductory: Meet the Jungle Friends | UVT 1 pretest and ASC 
2 | Ellie’s First Day | Jungle Friends concept word pretest; Listening center observations 
3 | Leo’s Brave Face | Listening center observations 
4 | Jungle Friends Go to the Beach | Listening center observations 
5 | Review: Jungle Friends Bake a Cake | UVT 1 posttest 
6 | Marquez Monkeys Around | UVT 2 pretest and ASC; Listening center observations 
7 | If Elephants Could Fly | Listening center observations 
8 | Leo Loses His Roar | Listening center observations 
9 | Review: Jungle Friends Go to the Park | UVT 2 posttest 
10 | Ellie Gets Stuck | UVT 3 pretest and ASC; Listening center observations 
11 | A New Jungle Friend | Listening center observations 
12 | Marquez’s Backwards Day | Listening center observations 
13 | Review: Jungle Friends Swinging Through the Vines | UVT 3 posttest and ASC 

Series 2: Forest Friends 
14 | Introductory: Meet the Forest Friends | Jungle Friends concept word posttest; UVT 4 pretest 
15 | Pablo’s Prickly Problem | Forest Friends concept word pretest; Listening center observations 
16 | Suki’s Sleepover Surprise | Listening center observations 
17 | Bobby’s EmBEARassing Visit | Listening center observations 
18 | Review: Suki Squirrel Goes Swimming | UVT 4 posttest 
19 | Fae’s Nose Knows | UVT 5 pretest and ASC; Listening center observations 
20 | Snow Day for Fae | Listening center observations 
21 | Where is Bobby Bear? | Listening center observations 
22 | Review: Pablo Packs a Picnic | UVT 5 posttest 
23 | Pablo’s Map Matters | UVT 6 pretest and ASC; Listening center observations 
24 | Fae’s Smelly Situation | Listening center observations 
25 | Suki’s Selfish Saturday | Listening center observations 
26 | Review: Forest Friends Go to the Library | UVT 6 posttest and ASC; Forest Friends concept word posttest 


Note. Introductory and review books are labeled; all other books are instructional books. Children listened 
to instructional books three times and the review book once. Children listened to books in small-group 
listening centers in their classrooms. Assessments were administered individually by research staff. 

UVT = Unit Vocabulary Test; ASC = Assessment of Story Comprehension. 